139 research outputs found

    Fizzy: feature subset selection for metagenomics

    BACKGROUND: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- and β-diversity. Feature subset selection - a sub-field of machine learning - can also provide unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can identify the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. RESULTS: We have developed a new Python command line tool for microbial ecologists, compatible with the widely adopted BIOM format, that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. CONCLUSIONS: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy. This item is part of the UA Faculty Publications collection. For more information about this item or other items in the UA Campus Repository, contact the University of Arizona Libraries.
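To make the idea of information-theoretic feature selection concrete, here is a minimal sketch of one of the simplest criteria in that family: ranking features by their mutual information with the phenotype label. It is not Fizzy's interface or necessarily the criterion the tool uses by default; the function names (`mutual_information`, `rank_features`), the toy OTU table, and the presence/absence discretization are all assumptions made for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def rank_features(table, labels, k):
    """Return the names of the k features with the highest mutual information.

    `table` maps a feature name to its discretized abundances, one per sample.
    """
    scores = {name: mutual_information(vals, labels) for name, vals in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: presence/absence of three OTUs in six samples.
otu_table = {
    "OTU_a": [1, 1, 1, 0, 0, 0],
    "OTU_b": [1, 0, 1, 0, 1, 0],
    "OTU_c": [1, 1, 0, 0, 0, 0],
}
phenotype = ["young", "young", "young", "old", "old", "old"]

top = rank_features(otu_table, phenotype, 2)
print(top)
```

Ranking each feature independently ignores redundancy between features; criteria that also penalize redundancy exist, but they are beyond this sketch.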

    NBC update: The addition of viral and fungal databases to the Naïve Bayes classification tool

    BACKGROUND: Classifying the fungal and viral content of a sample is an important component of analyzing microbial communities in environmental media. Therefore, a method to classify any fragment from these organisms' DNA should be implemented. RESULTS: We update the naïve Bayes classification (NBC) tool to classify reads originating from viral and fungal organisms. NBC classifies a fungal dataset similarly to the Basic Local Alignment Search Tool (BLAST) and the Ribosomal Database Project (RDP) classifier. We also show NBC's similarities to and differences from RDP on a fungal large subunit (LSU) ribosomal DNA dataset. For viruses in the training database, strain classification accuracy is 98%, while for reads originating from sequences not in the database, the order-level accuracy is 78%, where order indicates the taxonomic level in the tree of life. CONCLUSIONS: In addition to being competitive with other available classifiers, NBC has the potential to handle reads originating from any location in the genome. We recommend using the Bacteria/Archaea, Fungal, and Virus databases separately due to algorithmic biases towards long genomes. The tool is publicly available at http://nbc.ece.drexel.edu.

    Using the RDP Classifier to Predict Taxonomic Novelty and Reduce the Search Space for Finding Novel Organisms

    BACKGROUND: Currently, the naïve Bayesian classifier provided by the Ribosomal Database Project (RDP) is one of the most widely used tools to classify 16S rRNA sequences, mainly collected from environmental samples. We show that RDP has 97+% assignment accuracy and is fast for 250 bp and longer reads when the read originates from a taxon known to the database. Because most environmental samples will contain organisms from taxa whose 16S rRNA genes have not been previously sequenced, we aim to benchmark how well the RDP classifier and other competing methods can discriminate these novel taxa from known taxa. PRINCIPAL FINDINGS: Because each fragment is assigned a score (containing likelihood or confidence information such as the bootstrap score in the RDP classifier), we "train" a threshold to discriminate between novel and known organisms and observe its performance on a test set. The threshold that we determine tends to be conservative (low sensitivity but high specificity) for naïve Bayesian methods. Nonetheless, our method performs better with the RDP classifier than with the other methods tested, as measured by the F-measure and the area under the receiver operating characteristic curve on the test set. By constraining the database to well-represented genera, sensitivity improves by 3-15%. Finally, we show that the detector is a good predictor of novel abundant taxa (especially for finer levels of taxonomy where novelty is more likely to be present). CONCLUSIONS: We conclude that selecting a read-length-appropriate RDP bootstrap score can significantly reduce the search space for identifying novel genera and higher levels in taxonomy. In addition, having a well-represented database significantly improves performance, while having genera that are "highly" similar does not make a significant improvement. On a real dataset from an Amazon Terra Preta soil sample, we show that the detector can predict (or correlates with) whether novel sequences will be assigned to new taxa when the RDP database "doubles" in the future.
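The threshold "training" described in this abstract can be illustrated with a small sketch. It assumes that a lower classifier score makes a read more likely to be novel and that the cutoff is chosen to maximize the F-measure on labeled training reads; the abstract does not state the exact selection rule, so the criterion, the function names (`f_measure`, `train_threshold`), and the toy scores below are hypothetical.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def train_threshold(scores, is_novel):
    """Pick the cutoff t that maximizes F-measure for the rule: novel if score <= t.

    Candidate cutoffs are the observed scores; ties keep the lowest cutoff.
    """
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_novel) if s <= t and y)
        fp = sum(1 for s, y in zip(scores, is_novel) if s <= t and not y)
        fn = sum(1 for s, y in zip(scores, is_novel) if s > t and y)
        f = f_measure(tp, fp, fn)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

# Hypothetical training set: bootstrap-like scores and whether each read
# comes from a taxon absent from the reference database.
train_scores = [0.95, 0.90, 0.85, 0.60, 0.40, 0.30, 0.70, 0.20]
train_novel = [False, False, False, True, True, True, False, True]

threshold, train_f = train_threshold(train_scores, train_novel)
print(threshold, train_f)
```

In practice the chosen cutoff would then be evaluated on a held-out test set, as the abstract describes, rather than on the data used to select it.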

    Cerebrospinal fluid levels of opioid peptides in fibromyalgia and chronic low back pain

    BACKGROUND: The mechanism(s) of nociceptive dysfunction and potential roles of opioid neurotransmitters are unresolved in the chronic pain syndromes of fibromyalgia and chronic low back pain. METHODS: History and physical examinations, tender point examinations, and questionnaires were used to identify 14 fibromyalgia, 10 chronic low back pain and 6 normal control subjects. Lumbar punctures were performed. Met-enkephalin-Arg(6)-Phe(7) (MEAP) and nociceptin immunoreactive materials were measured in the cerebrospinal fluid by radioimmunoassays. RESULTS: Fibromyalgia (117.6 pg/ml; 85.9 to 149.4; mean, 95% C.I.; p = 0.009) and low back pain (92.3 pg/ml; 56.9 to 127.7; p = 0.049) groups had significantly higher MEAP than the normal control group (35.7 pg/ml; 15.0 to 56.5). MEAP was inversely correlated with systemic pain thresholds. Nociceptin was not different between groups. Systemic Complaints questionnaire responses were significantly ranked as fibromyalgia > back pain > normal. SF-36 domains demonstrated severe disability for the low back pain group, intermediate results in fibromyalgia, and high function in the normal group. CONCLUSIONS: Fibromyalgia was distinguished by higher cerebrospinal fluid MEAP, systemic complaints, and manual tender points; intermediate SF-36 scores; and lower pain thresholds compared to the low back pain and normal groups. MEAP and systemic pain thresholds were inversely correlated in low back pain subjects. Central nervous system opioid dysfunction may contribute to pain in fibromyalgia.

    Candidate Variants in DNA Replication and Repair Genes in Early-Onset Renal Cell Carcinoma Patients Referred for Germline Testing

    Background: Early-onset renal cell carcinoma (eoRCC) is typically associated with pathogenic germline variants (PGVs) in RCC familial syndrome genes. However, most eoRCC patients lack PGVs in familial RCC genes and their genetic risk remains undefined. Methods: Here, we analyzed biospecimens from 22 eoRCC patients that were seen at our institution for genetic counseling and tested negative for PGVs in RCC familial syndrome genes. Results: Analysis of whole-exome sequencing (WES) data found enrichment of candidate pathogenic germline variants in DNA repair and replication genes, including multiple DNA polymerases. Induction of DNA damage in peripheral blood monocytes (PBMCs) significantly elevated numbers of γH2AX foci, a marker of double-stranded breaks, in PBMCs from eoRCC patients versus PBMCs from matched cancer-free controls. Knockdown of candidate variant genes in Caki RCC cells increased γH2AX foci. Immortalized patient-derived B cell lines bearing the candidate variants in DNA polymerase genes (POLD1, POLH, POLE, POLK) had DNA replication defects compared to control cells. Renal tumors carrying these DNA polymerase variants were microsatellite stable but had a high mutational burden. Direct biochemical analysis of the variant Pol δ and Pol η polymerases revealed defective enzymatic activities. Conclusions: Together, these results suggest that constitutional defects in DNA repair underlie a subset of eoRCC cases. Screening patient lymphocytes to identify these defects may provide insight into mechanisms of carcinogenesis in a subset of genetically undefined eoRCCs. Evaluation of DNA repair defects may also provide insight into the cancer initiation mechanisms for subsets of eoRCCs and lay the foundation for targeting DNA repair vulnerabilities in eoRCC.

    Prospective screening study of 0.5 Tesla dedicated magnetic resonance imaging for the detection of breast cancer in young, high-risk women

    BACKGROUND: Evidence-based screening guidelines are needed for women under 40 with a family history of breast cancer, a BRCA1 or BRCA2 mutation, or other risk factors. An accurate assessment of breast cancer risk is required to balance the benefits and risks of surveillance, yet published studies have used narrow risk assessment schemata for enrollment. Breast density limits the sensitivity of film-screen mammography but is not thought to pose a limitation to MRI; however, the utility of MRI surveillance has not previously been examined specifically in women with dense breasts. Also, all MRI surveillance studies reported to date have used high-strength magnets that may not be practical for dedicated imaging in many breast centers. Medium-strength 0.5 Tesla MRI may provide an economical alternative for surveillance. METHODS: We conducted a prospective, nonrandomized pilot study of 30 women aged 25–49 years with dense breasts, evaluating the addition of 0.5 Tesla MRI to conventional screening. All participants had a high quantitative breast cancer risk, defined as ≥ 3.5% over the next 5 years per the Gail or BRCAPRO models, and/or a known BRCA1 or BRCA2 germline mutation. RESULTS: The average age at enrollment was 41.4 years and the average 5-year risk was 4.8%. Twenty-two subjects had BIRADS category 1 or 2 breast MRIs (negative or probably benign), whereas no category 4 or 5 MRIs (possibly or probably malignant) were observed. Eight subjects had BIRADS 3 results, identifying lesions that were "probably benign", yet prompting further evaluation. One of these subjects was diagnosed with a stage T1aN0M0 invasive ductal carcinoma, and was later determined to be a BRCA1 mutation carrier. CONCLUSION: Using medium-strength MRI, we were able to detect 1 early breast tumor that was mammographically undetectable among 30 young high-risk women with dense breasts. These results support the concept that breast MRI can enhance surveillance for young high-risk women with dense breasts, and further suggest that a medium-strength instrument is sufficient for this application. For the first time, we demonstrate the use of quantitative breast cancer risk assessment via a combination of the Gail and BRCAPRO models for enrollment in a screening trial.